Predicting False Positives of Protein-Protein Interaction Data by Semantic Similarity Measures
Abstract
Recent technical advances in identifying protein-protein interactions (PPIs) have generated genome-wide interaction data, collectively referred to as the interactome. These interaction data give insight into the underlying mechanisms of biological processes. However, PPI data determined by experimental and computational methods include an extremely large number of false positives, that is, interactions not confirmed to occur in vivo. Filtering PPI data is thus a critical preprocessing step for improving analysis accuracy. This article proposes integrating Gene Ontology (GO) data to assess the reliability of PPIs. We evaluate the performance of various semantic similarity measures in terms of functional consistency. Protein pairs with high semantic similarity are considered highly likely to share common functions and are therefore more likely to interact. We also propose a combined semantic similarity method for predicting false positive PPIs. The experimental results show that the combined method performs better than the individual semantic similarity classifiers. The proposed classifier predicted that 58.6% of the S. cerevisiae PPIs from the BioGRID database are false positives.
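As a rough illustration of the filtering idea described in the abstract, the sketch below scores protein pairs with one well-known semantic similarity measure (a Resnik-style information-content score) and flags low-scoring pairs as candidate false positives. The abstract does not say which measures were evaluated or how they were combined, so the choice of measure, the toy ontology, the annotations, the protein names, and the threshold are all assumptions made for illustration. The combined method is not reproduced here.

```python
import math

# Hypothetical toy ontology (not real GO terms): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical protein annotations: protein -> set of annotated terms.
ANNOTATIONS = {
    "P1": {"dna_binding"},
    "P2": {"rna_binding"},
    "P3": {"kinase"},
    "P4": {"dna_binding"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of proteins annotated to t or a descendant)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(term_a, term_b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(IC.get(term, 0.0) for term in common)


def protein_similarity(p, q):
    """Best-match similarity over all pairs of annotated terms."""
    return max(resnik(a, b) for a in ANNOTATIONS[p] for b in ANNOTATIONS[q])


def flag_false_positives(pairs, threshold):
    """Return the pairs whose similarity falls below the (assumed) threshold."""
    return [(p, q) for p, q in pairs if protein_similarity(p, q) < threshold]


if __name__ == "__main__":
    candidate_pairs = [("P1", "P4"), ("P1", "P2"), ("P1", "P3")]
    for p, q in candidate_pairs:
        print(p, q, round(protein_similarity(p, q), 3))
    print("flagged:", flag_false_positives(candidate_pairs, threshold=0.25))
```

In this toy setting, pairs that share a specific term score highest, pairs that share only a general ancestor score lower, and pairs whose only common ancestor is the root score zero and are flagged. A real pipeline would read annotations from GO and interactions from a source such as BioGRID, and would need a threshold chosen against reference data.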
Related articles
Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks
With the rapid development of biotechnologies, many types of biological data including molecular networks are now available. However, to obtain a more complete understanding of a biological system, the integration of molecular networks with other data, such as molecular sequences, protein domains and gene expression profiles, is needed. A key to the use of networks in biological studies is the ...
Estimating support for protein-protein interaction data with applications to function prediction.
Almost every cellular process requires the interactions of pairs or larger complexes of proteins. High throughput protein-protein interaction (PPI) data have been generated using techniques such as the yeast two-hybrid systems, mass spectrometry method, and many more. Such data provide us with a new perspective to predict protein functions and to generate protein-protein interaction networks, a...
Applications of semantic similarity measures
There has been much interest in uncovering protein-protein interactions and their underlying domain-domain interactions. Many experimental techniques have been developed, for example yeast-two-hybrid screening and tandem affinity purification. Since it is time consuming and expensive to perform exhaustive experimental screens, in silico methods are used for predicting interactions. However, all...
iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations
PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...
Kernel methods for predicting protein-protein interactions
MOTIVATION Despite advances in high-throughput methods for discovering protein-protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. RESULTS We present a kernel method for predicting protein-protein interactions usin...
Publication date: 2013